ctran: enable per-block flag for broadcast with tcpdm/unpack by fomichev · Pull Request #2079 · meta-pytorch/torchcomms

fomichev · 2026-04-15T02:46:54Z

Summary:
The TCPDM broadcast kernel (ncclKernelBroadcast<UNPACK=true>) hangs
with multiple GPUs because all 8 CUDA thread blocks share a single
kernel flag (flag[0]), creating a race condition on termination.

Send/recv kernels don't have this problem because they use per-block
flags (flag[blockIdx.x]): each block signals and terminates on its
own flag slot, so block 0 clearing its slot doesn't affect other blocks.

This only affects TCPDM, because that is the only path where the broadcast
kernel runs with more than one block (due to unpack).
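To make the race concrete, here is a minimal host-side sketch in C++ that models each CUDA thread block as a `std::thread`. It is not the ctran code: the flag states (`UNSET`, `STARTED`, `TERMINATE`), the handshake order, and the function names are all assumptions chosen for illustration. The first function shows why a shared slot can strand a block (the first block to see the terminate signal clears it), and the second shows that one slot per block lets every block terminate independently.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical flag states; the real ctran values may differ.
enum FlagState : int { UNSET = 0, STARTED = 1, TERMINATE = 2 };

// Shared-slot case, replayed sequentially so the outcome is deterministic:
// block 0 consumes TERMINATE and resets the slot before block 1 polls it.
bool secondBlockSeesTerminateOnSharedFlag() {
  std::atomic<int> flag{TERMINATE};        // host has requested termination
  if (flag.load() == TERMINATE) {
    flag.store(UNSET);                     // block 0 exits and clears flag[0]
  }
  return flag.load() == TERMINATE;         // block 1 polls the same slot
}

// One simulated block: announce start, wait for terminate, clear own slot.
void blockBody(std::atomic<int>* flags, int slot) {
  flags[slot].store(STARTED);
  while (flags[slot].load() != TERMINATE) {
    std::this_thread::yield();
  }
  flags[slot].store(UNSET);
}

// Per-block case: block b only ever touches flags[b] (flag[blockIdx.x]).
// Returns true if every block terminated and left its slot cleared.
bool runWithPerBlockFlags(int numBlocks) {
  std::vector<std::atomic<int>> flags(numBlocks);
  for (auto& f : flags) f.store(UNSET);

  std::vector<std::thread> blocks;
  for (int b = 0; b < numBlocks; ++b) {
    blocks.emplace_back(blockBody, flags.data(), b);
  }

  // Simulated host side: wait for each block to start, then signal it.
  for (auto& f : flags) {
    while (f.load() != STARTED) {
      std::this_thread::yield();
    }
    f.store(TERMINATE);
  }

  for (auto& t : blocks) t.join();

  for (auto& f : flags) {
    if (f.load() != UNSET) return false;
  }
  return true;
}
```

In this model, `secondBlockSeesTerminateOnSharedFlag()` returns false, which corresponds to a block that would keep polling forever, while `runWithPerBlockFlags(8)` joins all eight simulated blocks.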

Reviewed By: function47

Differential Revision: D99452874

Summary:
I know this is not something that's going to be supported for long or even
used in prod, but it works with minimal code changes, so I re-enabled v2.29
for conda with iter builds.

901 + ncclx + tcpdm, conda package: 0f473b3

Differential Revision: D100339239


meta-codesync · 2026-04-15T02:47:26Z

@fomichev has exported this pull request. If you are a Meta employee, you can view the originating Diff in D99452874.

meta-codesync · 2026-04-16T17:09:24Z

This pull request has been merged in a5eae6f.

Stanislav Fomichev and others added 2 commits April 14, 2026 08:22

meta-cla bot added the CLA Signed label Apr 15, 2026

meta-codesync bot added the fb-exported and meta-exported labels Apr 15, 2026

meta-codesync bot closed this in a5eae6f Apr 16, 2026

facebook-github-tools bot added the Merged label Apr 16, 2026

fomichev wants to merge 2 commits into meta-pytorch:main from fomichev:export-D99452874
